I. Introduction

Amid the Coronavirus Disease pandemic in 2020, governments around the world developed responses to aid their citizens and mitigate the spread of the Severe Acute Respiratory Syndrome Coronavirus 2. This study aims to understand the relationship between the most recent cumulative number of confirmed cases of COVID-19 per 10,000 individuals in different countries (on the 23rd of October 2020) and the earlier government responses to the outbreak in these countries (measured on the 15th of June 2020). A model will be constructed to depict this relationship.

The data used in this study were obtained from The Humanitarian Data Exchange data portal and include the total population of each country in 2019 [1], the Stringency and Economic Support indices on the 15th of June 2020 [2], and the cumulative number of confirmed cases of COVID-19 in each country [3] on the 23rd of October 2020. When comparing the number of infected individuals across countries, the population size of each country needs to be considered. The study therefore looks at the cumulative number of confirmed cases of COVID-19 recorded on the 23rd of October 2020 in each country per 10,000 individuals, calculated as: \(\frac{\text{cumulative cases in the country}}{\text{total population of the country}} \cdot 10{,}000\). The 23rd of October was chosen because it was the most recent date available when this study was carried out.
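As a minimal sketch of the per-10,000 calculation (written in Python for illustration; it is not the code used in this study, and the function name and example figures are hypothetical):

```python
def cases_per_10000(cumulative_cases: int, population: int) -> float:
    """Cumulative confirmed cases per 10,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so that simple inputs stay exact.
    return cumulative_cases * 10_000 / population


# Hypothetical country: 1,500 confirmed cases among 2,000,000 inhabitants.
rate = cases_per_10000(1_500, 2_000_000)
print(rate)  # 7.5
```

Expressing cases on this scale is what makes a small country such as Dominica comparable with a large one such as the United Kingdom in Table 1.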

The continuous variables Stringency Index and Economic Support Index are used to quantify the government response. The former accounts for closure, containment, and public health measures, and the latter accounts for the economic response taken by governments. Two other indices, the Government Response Index and the Containment and Health Index, were considered instead of the Economic Support Index. However, 9 of the 13 variables used to calculate the Government Response Index, and 9 of the 11 variables used to calculate the Containment and Health Index, are the same 9 variables used to calculate the Stringency Index. This suggests that these indices are highly correlated with the Stringency Index, which is not ideal for predictors in the same model. On the other hand, the Stringency Index and the Economic Support Index are calculated with no variables in common [4]. Note that the data provide different government responses for different regions within certain countries, for example the United States of America. Since this study looks at each country as a whole, the government response of a country on the 15th of June 2020 is taken to be the average of the responses of all its regions on that day.
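The regional averaging described above can be sketched as follows. This is an illustrative Python sketch, not the code used in this study; the records and the function name are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (country, region, Stringency Index) records for a single day.
records = [
    ("Country A", "Region 1", 60.0),
    ("Country A", "Region 2", 70.0),
    ("Country A", "Region 3", 80.0),
    ("Country B", "", 45.0),  # country reported as a single national row
]


def country_averages(rows):
    """Average the index over all rows that belong to the same country."""
    by_country = defaultdict(list)
    for country, _region, value in rows:
        by_country[country].append(value)
    return {country: mean(values) for country, values in by_country.items()}


averages = country_averages(records)
print(averages)
```

Note that this is an unweighted average: a small region counts as much as a large one, which is a simplification.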

When choosing the day on which to measure the government response, we required a day after April 28 2020, since that is when the Stringency Index and the Economic Support Index were refined and expanded to give a more accurate measure of the government response [5]. The day also had to be at least two weeks before the 23rd of October 2020, so that the response had time to take effect before its effect on the number of infected individuals was measured. Next, we looked at some of the factors used to calculate the Economic Support and Stringency indices (income support [6] and debt or contract relief [7], and international travel controls [8], respectively): countries changed their response on these variables only slightly, or not at all, between May and July. Having some of the components of the government response stay almost constant for a while allows the response to show its effect on the number of infected individuals more clearly, since the same response has been in place for some time, as opposed to a response that changes within a week of its implementation. Thus, we took a day in the middle of the May to July interval: the 15th of June 2020.
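The date constraints above are easy to check with a short Python sketch (illustrative only; the variable names are ours):

```python
from datetime import date

refinement_day = date(2020, 4, 28)  # indices refined and expanded
response_day = date(2020, 6, 15)    # day the government response is measured
cases_day = date(2020, 10, 23)      # day the cumulative cases are measured

lag_days = (cases_day - response_day).days
print(lag_days)  # 130

# The two constraints stated in the text.
chosen_after_refinement = response_day > refinement_day
at_least_two_weeks = lag_days >= 14
print(chosen_after_refinement, at_least_two_weeks)
```

The lag of 130 days is far longer than the two-week minimum, so the measured response had ample time to take effect.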

After organizing the data, 167 countries remain represented in the dataset, out of the 195 countries in the world (approximately 85%) [9].

Table 1. Sample of 5 randomly chosen countries from the data set used in this study

| Country | Stringency_Index | Economic_Support_Index | Population2019 | cumulative_confirmed_cases_per_10000 |
|---|---|---|---|---|
| Czech Republic | 41.670 | 62.5 | 10669709 | 223.3641049 |
| Dominica | 72.220 | 75.0 | 71808 | 5.1526292 |
| Romania | 50.930 | 87.5 | 19356544 | 103.8573828 |
| United Kingdom | 71.668 | 100.0 | 66834405 | 124.7875252 |
| Lao People’s Democratic Republic | 36.110 | 62.5 | 7169455 | 0.0334753 |

II. Exploratory data analysis


Table 2. Summary for the cumulative confirmed cases per 10,000

| n | min | median | mean | max | sd |
|---|---|---|---|---|---|
| 167 | 0.0334753 | 33.83375 | 70.11143 | 523.4503 | 94.78323 |

Our total sample size was 167 (Table 2). The mean cumulative confirmed cases (CCC) per 10,000 is about 70.11, far greater than the median of 33.83, indicating that the CCC distribution is heavily right-skewed, which can easily be observed in Figure 1. This is to be expected, since the lowest possible CCC is 0 whereas there is no such bound on the highest value. Most countries have a CCC below the 300 mark, but we also notice some very extreme cases (outliers).

Figure 1. Distribution for the cumulative confirmed cases per 10,000 for individual countries

The distribution of the Stringency Index (Figure 2), which measures government response, seems to resemble a bell shape although there is a slight skew on the left tail. The Economic Support Index distribution (Figure 3), which records measures such as income support and debt relief, also seems to be a bit left-skewed. We notice that there are two modes at 50 and 75, but suspect that could be due to rounding.

Figure 2. Distribution for the government response measured by the Stringency Index

Figure 3. Distribution for the government response measured by the Economic Support Index

In Figure 4, the scatterplot shows that there seems to be some correlation between the cumulative confirmed cases per 10,000 (CCC) and the Stringency Index, which suggests that, without implying any causal effect, countries with a higher number of cases per 10,000 also tend to have stricter pandemic response policies. It is worth noting that there are a few outliers (we consider those with a CCC above the 400 mark) that might have a larger influence on the best fit line. We also notice that for the many countries with (almost) 0 CCC, the response (Stringency Index) varies the most (from 0 to 100) compared to other CCC levels, with more points clustering in the [50, 75] range. This variation also holds for the Economic Support Index, which suggests that countries with very low CCC also spend varying amounts on income support and debt relief packages. However, countries with a higher CCC tend to spend more on such packages.

Figure 4. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their government response measured by the Stringency Index. The red line is the best fit line. The blue curve is the Loess curve.

The scatterplot in Figure 5 of the CCC against the Economic Support Index has more points at the bottom and fewer at the top. This suggests that countries with fewer cases per 10,000 individuals tend to spend less on economic relief packages.

Figure 5. Interactive Scatterplot for the cumulative confirmed cases per 10,000 for individual countries against their government response measured by the Economic Support Index. The red line is the best fit line. The blue curve is the Loess curve.


III. Multiple linear regression

i. Methods


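Our initial model, where \(x_1\) is the Stringency Index, \(x_2\) is the Economic Support Index, and \(\widehat{Y}_{CCPTTH}\) is the predicted cumulative confirmed cases per 10,000, is the following: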

\[ \begin{aligned}\widehat{Y}_{CCPTTH} =& b_{0} + b_{SI} \cdot (x_1) + b_{ESI} \cdot (x_2) \\ = & -31.3037 + 0.7567 \cdot (x_1) + 1.0102 \cdot (x_2) \end{aligned} \]

We fitted a linear model to the data and then performed a residual analysis, as an in-sample validation method, to detect any systematic departure from the assumptions on which the model is built: normality, independence, and homoscedasticity of the residuals. Figure 6 shows a normal Q-Q plot of the residuals, which plots the theoretical quantiles against their observed sample counterparts. The points follow an upward curve, implying that the residuals are heavily right-skewed. This is confirmed by Figure 7, which shows the histogram of the residuals.

Figure 6. Normal Q-Qplot for the model under discussion

Figure 7. Residuals distribution for the statistical model

In addition, Figures 8, 9 and 10 show a fanning-out pattern in the residuals, implying that the variance is not constant (heteroscedasticity).

Figure 8. Residuals graph for the fitted values, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 9. Residuals graph for the Stringency Index, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 10. Residuals graph for the Economic Support Index, with a Lowess curve in blue and a horizontal line at zero in red.

Because of the violations of the normality and homoscedasticity assumptions described above, a transformation is needed. Using the Box-Cox log-likelihood method (Figure 11), our dependent variable (CCC) is raised to the power 0.1818. This power is positive, so the transformation is increasing and does not alter the direction of the correlations in our later inference. Note that in Table 2 the minimum is 0.033 (and not 0), so the transformation is valid without having to leave out the y value of any country.
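For reference, the quantity usually maximised in a Box-Cox analysis can be written as below. This assumes the standard formulation; the report does not state which implementation produced Figure 11. Here \(y_i\) is the CCC of country \(i\), \(n = 167\), and \(\mathrm{RSS}(\lambda)\) is the residual sum of squares from regressing \(y^{(\lambda)}\) on the two indices.

\[ y_i^{(\lambda)} = \begin{cases} \dfrac{y_i^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\ \log y_i, & \lambda = 0 \end{cases} \qquad \ell(\lambda) = -\frac{n}{2} \log\left(\frac{\mathrm{RSS}(\lambda)}{n}\right) + (\lambda - 1) \sum_{i=1}^{n} \log y_i + \text{constant} \]

For \(\lambda \neq 0\), \(y^{(\lambda)}\) is a linear rescaling of \(y^{\lambda}\), so modelling the simple power \(y^{\lambda}\) gives an equivalent fit. Both forms require \(y_i > 0\), which is why the non-zero minimum in Table 2 matters.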

Figure 11. Graph resulting from a Box Cox Test

Comparing the residual plots of the transformed model (Figures 12 to 16) with those of the initial model, we observe that the distribution of the residuals is now closer to bell-shaped, the normal Q-Q plot shows an almost straight line, and the residual scatterplots are cloud-shaped (although the residuals against the Economic Support Index are more spread out). We conclude that the transformation allows the model assumptions to be reasonably met, so we proceed with our analysis.

Figure 12. Normal QQplot for the transformed model

Figure 13. Residuals distribution for the transformed statistical model

Figure 14. Residuals against the fitted values of the transformed model, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 15. Residuals graph for the Stringency Index after the transformation, with a Lowess curve in blue and a horizontal line at zero in red.

Figure 16. Residuals graph for the Economic Support Index after the transformation, with a Lowess curve in blue and a horizontal line at zero in red.

To ensure that multicollinearity is not a problem in the transformed model, the variance inflation factors (VIF) were calculated for its two predictors. Both values are very close to 1, indicating little to no multicollinearity, so the study proceeds with the chosen transformed model.

##       Stringency_Index Economic_Support_Index 
##                1.00014                1.00014
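With exactly two predictors, both share the same VIF, equal to \(1/(1 - r^2)\), where \(r\) is the correlation between them. The Python sketch below (illustrative only; the function names are ours) inverts that relation to show how weak a correlation the reported VIF implies.

```python
import math


def vif_from_correlation(r: float) -> float:
    """VIF of either predictor in a two-predictor model with correlation r."""
    return 1.0 / (1.0 - r * r)


def correlation_from_vif(vif: float) -> float:
    """Absolute correlation implied by a VIF (two-predictor case only)."""
    return math.sqrt(1.0 - 1.0 / vif)


reported_vif = 1.00014  # value shown in the output above
implied_r = correlation_from_vif(reported_vif)
print(round(implied_r, 4))
```

The implied absolute correlation between the two indices is roughly 0.01, consistent with the statement that they share no underlying variables.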

ii. Model Results


Table 3. Model Summary Table

## 
## Call:
## lm(formula = cumulative_confirmed_cases_per_10000_transf ~ Stringency_Index + 
##     Economic_Support_Index, data = tidy_joined_dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29303 -0.36540 -0.01137  0.39995  1.21433 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            1.017725   0.163390   6.229 3.81e-09 ***
## Stringency_Index       0.006525   0.002146   3.040  0.00275 ** 
## Economic_Support_Index 0.007814   0.001499   5.213 5.54e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5374 on 164 degrees of freedom
## Multiple R-squared:  0.1802, Adjusted R-squared:  0.1702 
## F-statistic: 18.02 on 2 and 164 DF,  p-value: 8.405e-08

Table 4. The 95% Confidence Intervals

| Term | 2.5 % | 97.5 % |
|---|---|---|
| (Intercept) | 0.6951053 | 1.3403447 |
| Stringency_Index | 0.0022874 | 0.0107619 |
| Economic_Support_Index | 0.0048542 | 0.0107742 |
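The intervals in Table 4 follow from the estimates and standard errors in Table 3 as estimate ± t × standard error. The Python sketch below is illustrative and is not the code used in this study. The critical value 1.9745 is our own approximation of the 97.5% quantile of a t distribution with 164 degrees of freedom; it is hard-coded because the Python standard library has no t quantile function.

```python
T_CRIT = 1.9745  # assumed: approximate 97.5% quantile of t with 164 df

# (estimate, standard error) pairs copied from Table 3.
coefficients = {
    "(Intercept)": (1.017725, 0.163390),
    "Stringency_Index": (0.006525, 0.002146),
    "Economic_Support_Index": (0.007814, 0.001499),
}


def confidence_interval(estimate, std_error, t_crit=T_CRIT):
    """Two-sided interval: estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width


intervals = {name: confidence_interval(*v) for name, v in coefficients.items()}
for name, (lower, upper) in intervals.items():
    print(f"{name:<24} {lower:.6f} {upper:.6f}")
```

Small differences from Table 4 in the last digits are expected, because the inputs are rounded.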

iii. Interpreting the regression table

Our model is the following:

\[ \begin{aligned}\widehat{Y}_{CCPTTH}^{0.182} =& b_{0,t} + b_{SI,t} \cdot (x_1) + b_{ESI,t} \cdot (x_2) \\ = & 1.017725 + 0.006525 \cdot (x_1) + 0.007814 \cdot (x_2) \end{aligned} \]

+ The intercept (\(b_{0,t}\) = 1.017725) is the estimated average of the number of cases per 10,000 raised to the power 0.1818 (2/11) for a country scoring 0 on both the Stringency Index and the Economic Support Index.

+ The slope estimate for the Stringency Index (\(b_{SI,t}\) = 0.006525) indicates that the average number of cases per 10,000 raised to the power 0.1818 increases by 0.006525 units for every one-unit increase in the Stringency Index, holding the Economic Support Index constant.

+ The slope estimate for the Economic Support Index (\(b_{ESI,t}\) = 0.007814) indicates that the average number of cases per 10,000 raised to the power 0.1818 increases by 0.007814 units for every one-unit increase in the Economic Support Index, holding the Stringency Index constant.

+ The adjusted R-squared is 0.1702, which is low, but the model does explain some of the variability in the observations compared with a model with no predictors at all (F-statistic of 18.02 on 2 and 164 DF, p-value 8.405e-08). The model is therefore better than using only the mean of the confirmed cases per 10,000 raised to the power 0.18, but more explanatory variables would be needed to explain more of the variability.
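To read the fitted equation on the original scale, the prediction on the transformed scale is raised to the power 1/0.1818 = 5.5. The Python sketch below is illustrative and is not the code used in this study. It assumes the exponent is exactly 2/11 (which is what 0.1818182 suggests) and uses the rounded coefficients from Table 3.

```python
B0, B_SI, B_ESI = 1.017725, 0.006525, 0.007814  # coefficients from Table 3
POWER = 2 / 11  # assumed exact value of the reported 0.1818182


def predict_transformed(si: float, esi: float) -> float:
    """Fitted value of CCC ** POWER for the given index values."""
    return B0 + B_SI * si + B_ESI * esi


def predict_cases_per_10000(si: float, esi: float) -> float:
    """Back-transform the fitted value to cases per 10,000."""
    return predict_transformed(si, esi) ** (1 / POWER)


# Stringency Index 20, Economic Support Index 50: close to the
# point estimate in the first row of Table 6.
print(round(predict_cases_per_10000(20, 50), 2))
```

Because the back-transformation is non-linear, a slope of 0.006525 on the transformed scale does not correspond to a fixed change in cases per 10,000; the change depends on where the country starts.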

iv. Inference for multiple regression

Using the confidence intervals in Table 4, we test several null hypotheses.

\[\begin{aligned} H_0:&\beta_{0,t} = 0 \\ \mbox{vs }H_A:& \beta_{0,t} \neq 0 \end{aligned}\]

For the intercept in the transformed model, the 95% confidence interval is [0.6951053, 1.3403447], which does not contain zero, so zero is not a plausible value at the 95% confidence level. The p-value is also small (3.81e-09), so we reject the null hypothesis that the intercept is 0 in favor of the alternative hypothesis that it is non-zero; the estimate is positive. In context, the intercept makes sense, since a country can choose to give no economic support and take no closure, containment, or public health measures (for the Stringency Index), while still having a positive cumulative number of individuals infected with COVID-19.

\[\begin{aligned} H_0:&\beta_{SI,t} = 0 \\ \mbox{vs }H_A:& \beta_{SI,t} \neq 0 \end{aligned}\]

For the Stringency Index, the 95% confidence interval for the slope is [0.0022874, 0.0107619], which does not contain zero, so zero is not a plausible value at the 95% confidence level. The p-value is also small (0.00275), so we reject the null hypothesis that the slope is 0 in favor of the alternative hypothesis that it is non-zero; the estimate is positive.

\[\begin{aligned} H_0:&\beta_{ESI,t} = 0 \\ \mbox{vs }H_A:& \beta_{ESI,t} \neq 0 \end{aligned}\]

For the Economic Support Index, the 95% confidence interval for the slope is [0.0048542, 0.0107742], which lies entirely above zero, so the slope is plausibly positive at the 95% confidence level. The p-value is also very small (5.54e-07), so we reject the null hypothesis that the slope is 0 in favor of the alternative hypothesis that it is non-zero; the estimate is positive.

The next research question we want to explore is: are the transformed cumulative cases significantly related to the Stringency Index when no other predictor is in the model? From the ANOVA table (Table 5), there is sufficient evidence (F = 8.873995, p < 0.01) to conclude that the Stringency Index is significantly related to the transformed cumulative cases when no other predictor is in the model.

Table 5. ANOVA table for the transformed model

| Term | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Stringency_Index | 1 | 2.563214 | 2.5632141 | 8.873995 | 0.0033322 |
| Economic_Support_Index | 1 | 7.848528 | 7.8485278 | 27.172056 | 0.0000006 |
| Residuals | 164 | 47.370672 | 0.2888456 | NA | NA |
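The F values in Table 5, and the R-squared figures in Table 3, can be recovered from the sums of squares alone. The Python sketch below is illustrative and is not the code used in this study; the variable names are ours.

```python
# Sums of squares and degrees of freedom copied from Table 5.
ss_stringency = 2.563214
ss_economic = 7.848528
ss_residual = 47.370672
df_residual = 164
n = 167

ms_residual = ss_residual / df_residual

# Sequential F tests: each term has 1 degree of freedom, so its mean
# square equals its sum of squares.
f_stringency = ss_stringency / ms_residual
f_economic = ss_economic / ms_residual

# Overall fit statistics.
ss_model = ss_stringency + ss_economic
ss_total = ss_model + ss_residual
r_squared = 1 - ss_residual / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
f_overall = (ss_model / 2) / ms_residual

print(round(f_stringency, 3), round(f_economic, 3))
print(round(r_squared, 4), round(adj_r_squared, 4), round(f_overall, 2))
```

The sums of squares in Table 5 are sequential: the Stringency Index row is tested with nothing else in the model, and the Economic Support Index row is tested after the Stringency Index has been added.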

Table 6 gives 95% prediction intervals for different values of the Stringency Index. For example, for a country with a Stringency Index of 20, an Economic Support Index of 50, and transformed cumulative confirmed cases per 10,000 equal to 1.2, the cumulative cases per 10,000 are predicted to be between 0.01378 and 199.3965.

The same reading applies to the other Stringency Index values (50, 70 and 90) in the prediction interval table. In other words, for any country with a Stringency Index of 50, 70 or 90 (and an Economic Support Index of 50 and transformed cumulative confirmed cases per 10,000 equal to 1.2), the cumulative cases per 10,000 are predicted to lie between the lower and upper limits in Table 6.

Table 6. The 95% prediction intervals where Stringency Index = 20, 50, 70, 90, respectively, for transformed cumulative confirmed cases per 10,000 = 1.2, and Economic Support Index = 50.

| SI | Point Estimate | Lower Limit | Upper Limit |
|---|---|---|---|
| 20 | 10.70778 | 0.01378 | 199.3965 |
| 50 | 20.68657 | 0.10941 | 288.2952 |
| 70 | 30.82751 | 0.29381 | 369.6069 |
| 90 | 44.71648 | 0.65209 | 474.4893 |
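The intervals in Table 6 look very lopsided on the original scale. The Python sketch below (illustrative only, again assuming the exponent is exactly 2/11) raises each table entry to that power. On the transformed scale the limits sit almost symmetrically around the point estimate, which is consistent with intervals computed on the transformed scale and then back-transformed.

```python
POWER = 2 / 11  # assumed exact value of the reported 0.1818182

# (SI, point estimate, lower limit, upper limit) rows copied from Table 6.
table_6 = [
    (20, 10.70778, 0.01378, 199.3965),
    (50, 20.68657, 0.10941, 288.2952),
    (70, 30.82751, 0.29381, 369.6069),
    (90, 44.71648, 0.65209, 474.4893),
]


def transformed_half_widths(rows):
    """Distance from the point estimate to each limit, on the y**POWER scale."""
    result = []
    for si, point, lower, upper in rows:
        p, lo, hi = (value ** POWER for value in (point, lower, upper))
        result.append((si, p - lo, hi - p))
    return result


half_widths = transformed_half_widths(table_6)
for si, below, above in half_widths:
    print(si, round(below, 3), round(above, 3))
```

The asymmetry on the original scale is therefore a consequence of the power transformation, not a sign of an error in the intervals.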

Table 7 gives 95% prediction intervals for different values of the Economic Support Index. For example, for a country with a Stringency Index of 75, an Economic Support Index of 25, and transformed cumulative confirmed cases per 10,000 equal to 1.2, the cumulative cases per 10,000 are predicted to be between 0.08134 and 272.0472.

The same reading applies to the other Economic Support Index values (50, 75 and 100) in the prediction interval table. In other words, for any country with an Economic Support Index of 50, 75 or 100 (and a Stringency Index of 75 and transformed cumulative confirmed cases per 10,000 equal to 1.2), the cumulative cases per 10,000 are predicted to lie between the lower and upper limits in Table 7.

Table 7. The 95% prediction intervals where Economic Support Index = 25, 50, 75, 100, respectively, for transformed cumulative confirmed cases per 10,000 = 1.2, and Stringency Index = 75.

| ESI | Point Estimate | Lower Limit | Upper Limit |
|---|---|---|---|
| 25 | 18.65860 | 0.08134 | 272.0472 |
| 50 | 33.91222 | 0.36412 | 393.3872 |
| 75 | 58.12835 | 1.14900 | 560.8008 |
| 100 | 94.95592 | 2.90336 | 789.0344 |

IV. Discussion

i. Conclusions

Our analysis shows that there seems to be some relationship between the total confirmed cases per 10,000 and the Stringency and Economic Support Indices of a country measured with a time lag of 130 days. We see evidence to suggest that CCC is positively correlated with the Stringency Index and the Economic Support Index (added to the model in that order), which aligns with our expectation: it is reasonable for a government to respond more strictly and to spend more of its budget on income support packages if its people are more affected by the pandemic.

ii. Limitations

The sample was not adjusted to account for the missing countries. Even though it represents approximately 85% of all the countries in the world, it may not represent groups of countries properly, for example by continent or socio-economic region. Also, the model was not validated using another sample, so its adequacy can also be questioned.

A common limitation of secondary data is the way in which the data are recorded and categorized. The data sets used in this study for the confirmed cases of COVID-19 and for the indices do not include data for every country registered with the World Bank (the population data set only includes countries registered with the World Bank). For example, the Maldives has no entry in the data set for the Stringency and Economic Support indices, although it does have an entry in the other two data sets. Using data that were not gathered for the specific purpose of this study is a limitation, since inconsistencies such as these are inevitable.

Other, non-linear models, such as higher-degree polynomial regression models, were considered. We decided to use the simpler model to avoid overfitting the data and to avoid overcomplicating the analysis unnecessarily.

iii. Further questions

The model can be greatly improved and become more helpful if other predictor variables are added to it. Variables that were not used to calculate the Stringency and Economic Support Indices could be looked into, since they will probably not have a strong correlation with the indices already in the model. Moreover, a step function can be explored due to the natural breaks in the Economic Support Index, and polynomial regression models of higher degree can be explored to try and explain more variability in the data.

Lastly, the change in the cumulative number of confirmed cases per 10,000 from the day the response was implemented to the most recent date could be studied, instead of the cumulative number of confirmed cases per 10,000 up to the most recent date.


V. Citations and References


  1. “Total Population”. World Bank Indicators of Interest to the COVID-19 Outbreak. COVID-19 Pandemic. World Bank. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed October 2020. https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases

  2. “OxCGRT_CSV”. Oxford COVID-19 Government Response Stringency Index. COVID-19 Pandemic. The Oxford COVID-19 Government Response Tracker. United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed October 2020. https://data.humdata.org/dataset/oxford-covid-19-government-response-tracker

  3. “time_series_covid19_confirmed_global.csv”. Novel Coronavirus (COVID-19) Cases Data. COVID-19 Pandemic. Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). United Nations Office for the Coordination of Humanitarian Affairs. 2020. Accessed October 2020. https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases

  4. “Methodology for calculating indices”. Oxford COVID-19 Government Response Stringency Index. COVID-19 Pandemic. Index methodology version 3.1. The Oxford COVID-19 Government Response Tracker. United Nations Office for the Coordination of Humanitarian Affairs. 25 May 2020. Accessed October 2020. https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/index_methodology.md

  5. “What’s Changed?”. Oxford COVID-19 Government Response Stringency Index. COVID-19 Pandemic. The Oxford COVID-19 Government Response Tracker. United Nations Office for the Coordination of Humanitarian Affairs. 28 April 2020. Accessed October 2020. https://www.bsg.ox.ac.uk/sites/default/files/OxCGRT.%20What%27s%20changed%2024%20April%202020.pdf

  6. “Income support during the COVID-19 pandemic”. Coronavirus pandemic. Our World in Data. 2020. Accessed October 2020. https://ourworldindata.org/grapher/income-support-covid?time=2020-06-19

  7. “Debt or contract relief during the COVID-19 pandemic”. Coronavirus pandemic. Our World in Data. 2020. Accessed October 2020. https://ourworldindata.org/grapher/debt-relief-covid?time=2020-06-26

  8. “International travel controls during the COVID-19 pandemic”. Coronavirus pandemic. Our World in Data. 2020. Accessed October 2020. https://ourworldindata.org/grapher/international-travel-covid?time=2020-06-23

  9. “How many Countries are there in the World?”. Worldometer. 2020. Accessed October 2020. https://www.worldometers.info/geography/how-many-countries-are-there-in-the-world/